Skip to content

fix(screentracker): prevent Send() from blocking when ReadyForInitialPrompt stays false#215

Merged
johnstcn merged 6 commits intofix/write-stabilize-non-fatal-phase1from
fix/send-blocks-initial-prompt-not-ready
Apr 9, 2026
Merged

fix(screentracker): prevent Send() from blocking when ReadyForInitialPrompt stays false#215
johnstcn merged 6 commits intofix/write-stabilize-non-fatal-phase1from
fix/send-blocks-initial-prompt-not-ready

Conversation

@johnstcn
Copy link
Copy Markdown
Member

@johnstcn johnstcn commented Apr 8, 2026

Fixes #209. Stacked on #208.

Changes

  • Make statusLocked() return "changing" when initialPromptReady is false, so Send() rejects with ErrMessageValidationChanging instead of blocking forever
  • Add (U+254C) as alternative to (U+2500) in findGreaterThanMessageBox / findGenericSlimMessageBox for Claude Code v2.1.87 onboarding screens
  • Add testdata fixture for the dashed-line Claude screen
  • Add TestSendRejectsWhenInitialPromptNotReady regression test
  • Update four existing tests that asserted stable when initialPromptReady was false
Implementation plan and decision log

Root cause

statusLocked() didn't check initialPromptReady. It returned "stable" when the screen was calm, but the stableSignal (which the send loop waits on) required initialPromptReady == true. If readiness never arrived, Send() blocked forever.

Key decisions

Decision Rationale
Return "changing" (not "initializing") Snapshot buffer is full (past initializing). "changing" matches ErrMessageValidationChanging.
Unconditional guard (not only when InitialPrompt configured) stableSignal gates ALL outbound messages on initialPromptReady, not just initial prompts. Status must reflect actual send capability.
U+254C only (not broader char set) This is the specific character seen in the wild. The statusLocked guard is the safety net for future unknown characters.
Update existing test assertions Four tests documented the buggy behavior (stable when not ready). Updated to expect changing.

🤖 Written by a Coder Agent. Will be reviewed by a human.

johnstcn added 3 commits April 8, 2026 16:45
…hed line

Claude Code v2.1.87 decided to get fancy with its onboarding screen,
using ╌ (BOX DRAWINGS LIGHT DOUBLE DASH HORIZONTAL) instead of the
classic ─ (BOX DRAWINGS LIGHT HORIZONTAL). Our detector stared at
this new character like a cat seeing a cucumber.

Add ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ as an alternative pattern in both
findGreaterThanMessageBox and findGenericSlimMessageBox.

Refs: #209
statusLocked() used to report 'stable' when the screen was calm,
blissfully unaware that initialPromptReady was false. Send() would
trust this optimistic status, enqueue a message, and then wait
forever for a stableSignal that would never arrive — like waiting
for a bus that got cancelled.

Return 'changing' when initialPromptReady is false so Send() rejects
immediately with ErrMessageValidationChanging instead of ghosting the
caller.

Fixes: #209
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

✅ Preview binaries are ready!

To test with modules: agentapi_version = "agentapi_215" or download from: https://github.com/coder/agentapi/releases/tag/agentapi_215

@johnstcn johnstcn self-assigned this Apr 9, 2026
Restore correct tab indentation in three test blocks that
picked up an extra tab during editing. Add trailing newline.
Update message_box.go doc comments to mention the ╌ variant.
Copy link
Copy Markdown
Member Author

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Deep review of PR #215 (5 reviewers: Test Auditor, Edge Case Analyst, Contract Auditor, Concurrency Reviewer, Style Reviewer).

Core fix is clean — statusLocked() now reflects actual Send() capability, and the concurrency model is correct (one-way latch under lock, no new hazards). Good regression test that directly exercises the bug scenario.

Formatting issues (extra tabs, missing newline, stale doc comments) were caught by the review and already fixed in 934e832. Two observations remain worth noting inline. No blocking findings remain.

// rejects with ErrMessageValidationChanging instead of blocking
// indefinitely on a stableSignal that will never fire.
if !c.initialPromptReady {
return ConversationStatusChanging
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obs ConversationStatusChanging now carries two distinct meanings: "screen content is actively changing" and "agent readiness detection hasn't fired." (Edge Case Analyst, Contract Auditor)

An external consumer watching /status cannot distinguish between transient screen instability (which resolves on its own) and permanent readiness failure (which may never resolve without user intervention).

This is strictly better than the old behavior (status said "stable" but Send() hung). Both changing and initializing map to running in the HTTP layer anyway, so the public API surface doesn't change. If a future feature needs to surface "agent not recognized" vs. "screen still settling," a new status value or a separate readiness field would be needed — but that's out of scope here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note Adding to this thread: when readiness never fires, the system now reports "running" indefinitely with no log line or error explaining why. The user's only recourse is inspecting the terminal directly. This is strictly better than the old hang, and the dual-meaning limitation is acknowledged as out of scope here. Noting for awareness since there's no planned follow-up or tracking issue. (Mafuuu)

🤖

func findGreaterThanMessageBox(lines []string) int {
for i := len(lines) - 1; i >= max(len(lines)-6, 0); i-- {
if strings.Contains(lines[i], ">") {
if i > 0 && strings.Contains(lines[i-1], "───────────────") {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obs findGenericSlimMessageBox accepts mixed border styles — e.g. ─── on top and ╌╌╌ on bottom would match. (Edge Case Analyst, Contract Auditor)

The top and bottom borders are checked independently with ||. In practice, Claude Code uses the same character for both borders, so this is defensive rather than problematic.

Agreed this is fine — more permissive detection reduces false negatives, and the narrow scan window (last 9 lines) plus the prompt-character requirement on the middle line make false positives unlikely. Worth knowing if a false-positive concern arises later.

@johnstcn johnstcn marked this pull request as ready for review April 9, 2026 12:58
@johnstcn johnstcn requested review from Copilot and mafredri and removed request for Copilot April 9, 2026 12:58
Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct fix, well-researched. The root cause analysis is genuine: statusLocked() returning "stable" while stableSignal required initialPromptReady was a real semantic mismatch, and the guard closes the gap at the right layer. Belt-and-suspenders approach (detection fix + status guard) is the right call for this class of bug.

Three P3s, two notes. The headline finding is a test coverage gap in findGenericSlimMessageBox's dashed-border path, which two reviewers found independently. The statusLocked() guard and the findGreaterThanMessageBox path are well-tested.

As Hisoka put it: "The API-visible change ('running' instead of 'stable' when initialPromptReady is false) is correct: it was lying before."

Process note: the style-cleanup commit (934e832) fixes gofmt indentation and doc comments that should have been part of the earlier code commits, suggesting the initial commits were pushed without running the formatter.

🤖 This review was automatically generated with Coder Agents.

…t, drop plans

- Extract containsHorizontalBorder() to collapse 6 inline border
  checks into one — no more copy-paste miscounts on 15-char Unicode
  strings
- Add msg_slim_dashed_box.txt fixture with │ prompt to exercise
  findGenericSlimMessageBox's ╌ path (previously untested because
  the > fixture hit findGreaterThanMessageBox first)
- Rename screen → message in test lambda for consistency
- Remove docs/plans/ from the branch
Copilot AI review requested due to automatic review settings April 9, 2026 14:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a screentracker deadlock where Send() could block indefinitely when ReadyForInitialPrompt never becomes true, and improves Claude Code onboarding screen detection so readiness can be detected on dashed-line variants.

Changes:

  • Update statusLocked() to return "changing" while initialPromptReady is false, causing Send() to fail fast with ErrMessageValidationChanging instead of hanging.
  • Extend message-box detection to treat (U+254C) as a valid horizontal border character alongside (U+2500).
  • Add new Claude readiness fixtures and a regression test to prevent reintroducing the Send() hang.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
lib/screentracker/pty_conversation.go Aligns status reporting with send-loop readiness gating to avoid indefinite Send() blocking.
lib/screentracker/pty_conversation_test.go Updates readiness/status expectations and adds regression coverage for the hang.
lib/msgfmt/message_box.go Adds dashed-border support via a shared border-detection helper.
lib/msgfmt/testdata/initialization/claude/ready/msg_dashed_box.txt Fixture for Claude screen variant using borders and > prompt.
lib/msgfmt/testdata/initialization/claude/ready/msg_slim_dashed_box.txt Fixture for slim message box variant using borders.
Comments suppressed due to low confidence (1)

lib/screentracker/pty_conversation.go:586

  • The comment "Handle initial prompt readiness" is a bit misleading now that readiness is handled by the earlier !c.initialPromptReady guard. This block is really the general rule that status should be changing while there are outbound messages queued or a message is being sent; consider updating the comment to reflect the actual condition to avoid confusion during future changes.
	// Handle initial prompt readiness: report "changing" until the queue is drained
	// to avoid the status flipping "changing" -> "stable" -> "changing"
	if len(c.outboundQueue) > 0 || c.sendingMessage {
		return ConversationStatusChanging
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@johnstcn johnstcn requested a review from mafredri April 9, 2026 14:36
Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three round 1 P3s addressed cleanly in cb5b5ce: containsHorizontalBorder() helper extracted, msg_slim_dashed_box.txt fixture covers findGenericSlimMessageBox's path, docs/plans/ removed, parameter renamed. Both notes resolved.

One new P3 (test naming), one note (comment wording). The fix itself is solid. Four reviewers independently verified the statusLocked() guard, fixture coverage, and lifecycle correctness.

🤖 This review was automatically generated with Coder Agents.

@johnstcn johnstcn merged commit 3cb783f into fix/write-stabilize-non-fatal-phase1 Apr 9, 2026
3 checks passed
@mafredri mafredri deleted the fix/send-blocks-initial-prompt-not-ready branch April 9, 2026 16:30
johnstcn added a commit that referenced this pull request Apr 10, 2026
… don't echo input (#208)

Fixes #123.

- Make Phase 1 (echo detection) of `writeStabilize` non-fatal on timeout -- proceed to Phase 2 instead of returning HTTP 500
- Guard non-fatal path with `errors.Is(err, util.WaitTimedOut)` so context cancellation still propagates
- Reduce Phase 1 timeout from 15s to 2s (terminal echo is near-instant)
- Extract `writeStabilizeEchoTimeout` and `writeStabilizeProcessTimeout` constants
- Log at Info level (not Warn) since non-echoing agents hit this on every message
- Add doc comment on `formatPaste` in `claude.go` documenting the ESC limitation with TUI selection prompts
- Add tests for non-echoing agents (reacts, unresponsive, context-cancelled)

## Known limitation

For TUI selection prompts (numbered/arrow-key lists), this fix eliminates the 500 but does **not** deliver the correct selection -- the `\x1b` (ESC) in bracketed paste cancels the selection widget. The correct approach is `MessageTypeRaw`. Documented via a comment on `formatPaste` in `lib/httpapi/claude.go`.

Also discovered a separate issue during smoke-testing: #209 (fixed in #215, included in this branch).

> 🤖 Written by a Coder Agent. Reviewed by several humans and a veritable army of robots.


Co-authored-by: 35C4n0r <70096901+35C4n0r@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants